Why visualize data?

Good visualizations can:

  • Give powerful summaries of the underlying data

  • Communicate insights to audiences who have not had the luxury of spending as much time with the data as you have

As a data analyst or data scientist, it’s your responsibility to make the key high-level summaries and takeaways clear in every data visual you create.

Some Features of Good Visualizations

  • Clear about what they’re communicating

  • Well-defined axes, with appropriate scaling and labels

  • Good choice of colors and annotations (visually appealing)

  • Less is more

Some Features of Bad Visualizations

  • Cluttered: too much going on in the chart, with no clear communication goal

  • Truncated axes that start at non-zero values, which exaggerates differences and distorts interpretation (especially in bar charts)

  • Poor choice of colors

  • Unnecessary 3D effects
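
To make the truncated-axis point concrete, here is a minimal sketch in base R. The two values and the helper `drawn_ratio()` are made up for illustration and are not taken from the Netflix data. It compares how far apart two bars appear when the axis starts at zero versus at 90.

```r
# Illustrative values only (not taken from the Netflix data).
values <- c(A = 95, B = 100)

# Ratio of the drawn bar heights when the axis starts at `baseline`.
# (Hypothetical helper written for this example.)
drawn_ratio <- function(values, baseline) {
  heights <- values - baseline
  unname(max(heights) / min(heights))
}

ratio_zero      <- drawn_ratio(values, baseline = 0)   # the true ratio
ratio_truncated <- drawn_ratio(values, baseline = 90)  # what the reader sees

cat("Axis starts at 0: ", round(ratio_zero, 2), "\n")
cat("Axis starts at 90:", round(ratio_truncated, 2), "\n")

# Draw both versions side by side when running interactively.
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  barplot(values, ylim = c(0, 100), main = "Axis starts at 0")
  barplot(values, ylim = c(90, 100), xpd = FALSE, main = "Axis starts at 90")
  par(op)
}
```

With these numbers, B is only about 5% larger than A, yet on the truncated axis its bar is drawn twice as tall.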

Our data for today - Netflix Movies & TV Shows

##    show_id              type              title             director        
##  Length:8807        Length:8807        Length:8807        Length:8807       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      cast             country           date_added         release_year 
##  Length:8807        Length:8807        Length:8807        Min.   :1925  
##  Class :character   Class :character   Class :character   1st Qu.:2013  
##  Mode  :character   Mode  :character   Mode  :character   Median :2017  
##                                                           Mean   :2014  
##                                                           3rd Qu.:2019  
##                                                           Max.   :2021  
##     rating            duration          listed_in         description       
##  Length:8807        Length:8807        Length:8807        Length:8807       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 
Data summary

Name               netflix
Number of rows     8807
Number of columns  12

Column type frequency:
  character        11
  numeric          1

Group variables    None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
show_id                0           1.00    2    5      0      8807           0
type                   0           1.00    5    7      0         2           0
title                  0           1.00    1  104      0      8807           0
director            2634           0.70    2  208      0      4528           0
cast                 825           0.91    3  771      0      7692           0
country              831           0.91    4  123      0       748           0
date_added            10           1.00   11   18      0      1714           0
rating                 4           1.00    1    8      0        17           0
duration               3           1.00    5   10      0       220           0
listed_in              0           1.00    6   79      0       514           0
description            0           1.00   61  248      0      8775           0

Variable type: numeric

skim_variable  n_missing  complete_rate     mean    sd    p0   p25   p50   p75  p100  hist
release_year           0              1  2014.18  8.82  1925  2013  2017  2019  2021  ▁▁▁▁▇
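
As a quick check on how to read the summary, the complete_rate column is the share of non-missing values in each column, that is, 1 - n_missing / number of rows. The sketch below redoes that arithmetic in base R using the counts printed above. The variable names are chosen for this example, and the commented lines at the end assume a data frame called `df`.

```r
n_rows <- 8807  # number of rows reported in the data summary

# Missing-value counts copied from the n_missing column.
n_missing <- c(director = 2634, cast = 825, country = 831,
               date_added = 10, rating = 4, duration = 3)

# Share of non-missing values per column.
complete_rate <- 1 - n_missing / n_rows
print(round(complete_rate, 2))

# On a data frame `df` (an assumed name), the same quantities
# could be computed directly:
#   colSums(is.na(df))        # missing values per column
#   1 - colMeans(is.na(df))   # complete rate per column
```

Rounded to two decimals, director comes out at 0.70 and cast and country at 0.91, matching the table. The director column is therefore the one that needs most care before plotting.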

Some Bad Visualizations

Example 1

What’s wrong with that plot?

The visualization is bad because:

  • It’s vague: putting all the ratings together does not help the audience identify what you’re trying to communicate.

  • There are too many rating categories. Remember, good visuals give high-level summaries (less is more).

  • The pie chart used here is not the best tool for comparing multiple categories.

  • Pie charts also make it difficult for your audience to judge the relative sizes of the slices.
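
One possible way to address these points is to pool the small categories into an "Other" group and draw a sorted bar chart instead of a pie. The sketch below uses base R. The counts are hypothetical placeholders rather than the real Netflix figures, and `lump_small()` is a helper written for this example, not a function from any package.

```r
# Hypothetical counts for illustration only; not the real Netflix figures.
rating_counts <- c("TV-MA" = 320, "TV-14" = 215, "TV-PG" = 85, "R" = 80,
                   "PG-13" = 50, "TV-Y7" = 33, "TV-Y" = 30, "PG" = 28,
                   "TV-G" = 22, "NR" = 8, "G" = 4)

# Keep the n_keep largest categories and pool the rest into "Other".
lump_small <- function(counts, n_keep = 5) {
  sorted <- sort(counts, decreasing = TRUE)
  if (length(sorted) <= n_keep) return(sorted)
  c(sorted[seq_len(n_keep)], Other = sum(sorted[-seq_len(n_keep)]))
}

lumped <- lump_small(rating_counts, n_keep = 5)
print(lumped)

# Horizontal bars with the largest category at the top;
# bar lengths are easier to compare than pie-slice angles.
if (interactive()) {
  op <- par(mar = c(4, 6, 2, 1))
  barplot(rev(lumped), horiz = TRUE, las = 1,
          xlab = "Number of titles",
          main = "Titles by rating (illustrative counts)")
  par(op)
}
```

Eleven categories collapse to six here, and the total is unchanged, so the chart still accounts for every title. How many categories to keep is a judgment call that depends on the message you want the chart to carry.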

Let’s look at another.

Example 2

Examples of Good Visualizations

Example 1

Example 2

Visit this GitHub repo for the code.